8 research outputs found

Data Structure Lower Bounds for Document Indexing Problems

Author: Afshani Peyman
Nielsen Jesper Sindahl
Publication date: 01/01/2016

We study data structure problems related to document indexing and pattern matching queries. Our main contribution is to show that the pointer machine model of computation can be extremely useful in proving high and unconditional lower bounds that cannot be obtained in any other known model of computation with the current techniques. Often our lower bounds match the known space-query time trade-off curve, and in fact, for all the problems considered, there is a very good and reasonable match between our lower bounds and the known upper bounds, at least for some choice of input parameters. The problems that we consider are set intersection queries (both the reporting variant and the semi-group counting variant), indexing a set of documents for two-pattern queries, forbidden-pattern queries, or queries with wild-cards, and indexing an input set of gapped-patterns (or two-patterns) to find those matching a document given at the query time.

Comment: Full version of the conference version that appeared at ICALP 2016, 25 page

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Applications of incidence bounds in point covering problems

Author: Afshani Peyman
Berglin Edvin
Nielsen Jesper Sindahl
van Duijn Ingo
Publication date: 01/01/2016

In the Line Cover problem a set of n points is given and the task is to cover the points using either the minimum number of lines or at most k lines. In Curve Cover, a generalization of Line Cover, the task is to cover the points using curves with d degrees of freedom. Another generalization is the Hyperplane Cover problem, where points in d-dimensional space are to be covered by hyperplanes. All these problems have kernels of polynomial size, where the parameter is the minimum number of lines, curves, or hyperplanes needed. First we give a non-parameterized algorithm for both problems in O*(2^n) time (where the O*(.) notation hides polynomial factors of n) and polynomial space, beating a previous exponential-space result. Combining this with incidence bounds similar to the famous Szemerédi-Trotter bound, we present a Curve Cover algorithm with running time O*((Ck/log k)^((d-1)k)), where C is some constant. Our result improves the previous best times O*((k/1.35)^k) for Line Cover (where d=2), O*(k^(dk)) for general Curve Cover, as well as a few other bounds for covering points by parabolas or conics. We also present an algorithm for Hyperplane Cover in R^3 with running time O*((Ck^2/log^(1/5) k)^k), improving on the previous time of O*((k^2/1.3)^k).

Comment: SoCG 201

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Audio Quality Assurance: An Application of Cross Correlation: Paper - iPRES 2012 - Digital Curation Institute, iSchool, Toronto

Author: Ammitzboll Jurik Bolette
Sindahl Nielsen Jesper
Publication venue: Digital Curation Institute, iSchool University of Toronto
Publication date: 01/11/2012

We describe algorithms for automated quality assurance on the content of audio files in the context of preservation actions and access. The algorithms use cross correlation to compare the sound waves. They are used for overlap analysis in an access scenario, where preserved radio broadcasts are used in research and annotated. They have also been applied in a migration scenario, where radio broadcasts are to be migrated for long-term preservation. This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
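The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea of using cross correlation to find where two sample sequences overlap. The function names `cross_correlation` and `best_lag`, the unnormalized scoring, and the brute-force search over lags are all assumptions made for illustration, not the authors' method.

```python
def cross_correlation(a, b, lag):
    """Sum of a[i + lag] * b[i] over all indices where both samples exist.

    Hypothetical helper: unnormalized, so louder signals score higher.
    """
    total = 0.0
    for i, sample in enumerate(b):
        j = i + lag
        if 0 <= j < len(a):
            total += a[j] * sample
    return total


def best_lag(a, b, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest correlation.

    A brute-force search; real audio would likely need a normalized,
    FFT-based correlation instead.
    """
    return max(
        range(-max_lag, max_lag + 1),
        key=lambda lag: cross_correlation(a, b, lag),
    )


# Toy "recordings": b is a clip that appears inside a, starting at index 2.
recording = [0, 0, 1, 2, 3, 0, 0]
clip = [1, 2, 3]
offset = best_lag(recording, clip, max_lag=4)
```

In this toy input the highest score is reached when the clip is aligned with its copy inside the longer sequence, which is the kind of signal an overlap analysis would look for.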

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

A lower bound for jumbled indexing

Author: Afshani Peyman
Killmann Rasmus
Nielsen Jesper Sindahl
van Duijn Ingo
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2020

Crossref

VBN

Top-k Term-Proximity in Succinct Space

Author: Munro J. Ian
Navarro Gonzalo
Nielsen Jesper Sindahl
Shah Rahul
Thankachan Sharma V.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Let D = {T_1, T_2, ..., T_D} be a collection of D string documents of n characters in total, that are drawn from an alphabet set Σ = [σ]. The top-k document retrieval problem is to preprocess D into a data structure that, given a query (P[1..p], k), can return the k documents of D most relevant to pattern P. The relevance is captured using a predefined ranking function, which depends on the set of occurrences of P in T_d. For example, it can be the term frequency (i.e., the number of occurrences of P in T_d), or it can be the term proximity (i.e., the distance between the closest pair of occurrences of P in T_d), or a pattern-independent importance score of T_d such as PageRank. Linear space and optimal query time solutions already exist for this problem. Compressed and compact space solutions are also known, but only for a few ranking functions such as term frequency and importance. However, space efficient data structures for term proximity based retrieval have been evasive. In this paper we present the first sub-linear space data structure for this relevance function, which uses only o(n) bits on top of any compressed suffix array of D and solves queries in time O((p+k) polylog n).
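To make the two ranking functions defined above concrete, here is a naive sketch that computes term frequency and term proximity by scanning each document directly. It is not the succinct data structure from the paper: the function names, the linear scan, and the tie-breaking by document index are assumptions chosen purely for illustration.

```python
def occurrences(text, pattern):
    """Return all start positions of pattern in text (overlaps allowed)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def term_frequency(text, pattern):
    """Number of occurrences of pattern in text."""
    return len(occurrences(text, pattern))


def term_proximity(text, pattern):
    """Distance between the closest pair of occurrences, or None if < 2."""
    pos = occurrences(text, pattern)
    if len(pos) < 2:
        return None
    # Positions are sorted, so the closest pair is always adjacent.
    return min(b - a for a, b in zip(pos, pos[1:]))


def top_k_by_proximity(docs, pattern, k):
    """Indices of the k documents with the smallest term proximity.

    Documents with fewer than two occurrences are skipped; ties are
    broken by document index (an arbitrary choice for this sketch).
    """
    scored = []
    for index, doc in enumerate(docs):
        distance = term_proximity(doc, pattern)
        if distance is not None:
            scored.append((distance, index))
    scored.sort()
    return [index for _, index in scored[:k]]


docs = ["abracadabra", "banana", "xyz"]
ranking = top_k_by_proximity(docs, "a", k=2)
```

This scan takes time proportional to the total length of the collection per query, which is exactly the cost that an index answering queries in O((p+k) polylog n) time is meant to avoid.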

Repositorio Académico de la Universidad de Chile

Top-k Term-Proximity in Succinct Space

Author: D Belazzougui
D Gusfield
G Benson
G Manzini
G Navarro
Gonzalo Navarro
J. Ian Munro
Jesper Sindahl Nielsen
M Berg de
P Ferragina
R Baeza-Yates
R Raman
Rahul Shah
S Büttcher
Sharma V. Thankachan
T Gagie
U Manber
W-K Hon
Publication venue: 'Springer Science and Business Media LLC'

Crossref